As a Senior Machine Learning Ops Engineer, you will design and build large-scale architectures, workflows, tools, and automation for processing data, and apply machine learning engineering to solve global business challenges, with a focus on making tasks easier for data scientists.
Job listings
This role involves architecting and maintaining robust, scalable, and secure infrastructure solutions. You'll collaborate closely with cross-functional teams to streamline our deployment processes and enhance system reliability. Responsibilities: design, implement, and manage AWS infrastructure; develop and maintain Kubernetes clusters, ensuring high availability and scalability; implement and manage CI/CD pipelines using GitHub Actions; and monitor system performance, troubleshoot issues, and ensure system reliability and security.
As a Senior Site Reliability Engineer, you will help enhance the stability, performance, and observability of platforms, focusing on maintaining and optimizing the current infrastructure and ensuring strong monitoring coverage. You will also support compliance and security practices and collaborate closely with development teams to supervise the platforms, optimize system behavior, and drive improvements in security and documentation practices.